feat: PDF ingestion via unpdf with page chunking by RaviTharuma · Pull Request #33 · AVIDS2/memorix

RaviTharuma · 2026-03-29T22:27:07Z

Summary

Adds an memorix_ingest_pdf MCP tool that extracts text from PDF files using unpdf and stores each page (or chunk) as a memorix observation.

What's included

src/multimodal/pdf-loader.ts — PDF text extractor with:
- Page-level chunking (one observation per page by default)
- Configurable max pages
- Graceful handling of empty/corrupted PDFs
- Optional unpdf dependency (clear error if not installed)
src/server.ts — MCP tool registration (memorix_ingest_pdf)
tests/multimodal/pdf-loader.test.ts — 4 tests covering:
- Multi-page extraction and observation storage
- Empty PDF handling
- Page limit configuration
- Missing file error handling

Design decisions

unpdf is an optional peer dependency — tool returns clear error if not installed (npm install unpdf).
Each page becomes a separate observation for granular search/retrieval.
Stores as how-it-works observation type with PDF metadata (page number, total pages, filename) in facts.
No heavy native dependencies — unpdf is pure JS/WASM.

Tests

4 pass, 0 fail

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0e39c413c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T22:32:16Z

src/multimodal/pdf-loader.ts

+ */
+export async function extractPdfText(input: PdfInput): Promise<PdfExtractionResult> {
+  // Dynamic import — unpdf is optional
+  const unpdf = await import('unpdf').catch(() => null) as {


Declare unpdf as an optional dependency

This new loader hard-depends on resolving import('unpdf'), but the package is not declared in Memorix’s dependency metadata, so default installs won’t provide it. In the documented global-install flow (npm install -g memorix), telling users to run local npm install unpdf still won’t make this import resolvable from the global package tree, so memorix_ingest_pdf will consistently fail for those users.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-29T22:32:16Z

src/multimodal/pdf-loader.ts

+  const limit = Math.min(rawPages.length, maxPages);
+
+  for (let i = 0; i < limit; i++) {


Enforce integer maxPages before page loop

maxPages is treated as an arbitrary number and used directly in i < limit, which causes fractional values to round up in practice (e.g., maxPages=1.1 processes pages 1 and 2). Negative values also silently result in zero ingestion. Because this tool parameter represents a page count, it should be validated/coerced to a positive integer before computing the loop bound.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-29T22:32:16Z

tests/multimodal/pdf-loader.test.ts

@@ -0,0 +1,32 @@
+import { describe, it, expect } from 'bun:test';


Switch new test file to Vitest imports

The repository’s test runner is Vitest (npm test runs vitest run), but this file imports from bun:test. In the Node/Vitest environment this module is not available, so the new test suite cannot execute as part of normal CI/test runs.

Useful? React with 👍 / 👎.

AVIDS2 · 2026-03-30T13:36:24Z

Thanks for working on PDF ingestion — page-level storage is a good direction, but this branch still has a few hard blockers.

The test file is using bun:test, while the repo runs vitest in CI.
src/multimodal/pdf-loader.ts currently fails typecheck because unpdf is dynamically imported but not represented in a way TypeScript can resolve in this repo's current dependency/type setup.
In src/server.ts, the object flowing into storeObservation is inferred as type: string rather than a proper ObservationType.

So at the moment this is not just a dependency-install issue — it needs one more pass to make the optional dependency story and the repo's type/test expectations line up.

feat: PDF ingestion via unpdf with page chunking (memorix-nmi)

b0e39c4

chatgpt-codex-connector bot reviewed Mar 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: PDF ingestion via unpdf with page chunking#33

feat: PDF ingestion via unpdf with page chunking#33
RaviTharuma wants to merge 1 commit intoAVIDS2:mainfrom
RaviTharuma:feature/memorix-nmi-pdf-ingestion

RaviTharuma commented Mar 29, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Uh oh!

AVIDS2 commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		const limit = Math.min(rawPages.length, maxPages);

		for (let i = 0; i < limit; i++) {

		@@ -0,0 +1,32 @@
		import { describe, it, expect } from 'bun:test';

Conversation

RaviTharuma commented Mar 29, 2026

Summary

What's included

Design decisions

Tests

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

AVIDS2 commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants